The body condition and growth of Eastern Baltic cod (Gadus morhua) has declined steadily since the regime shift in the early 1990’s to a degree that the stock now can be viewed as collapsed. Several hypotheses have been put forward, including changes in overlap with pelagic prey or lack of appropriately sized prey (e.g. Casini et al, 2016; Gårdmark et al, 2015), reduced oxygen levels decreasing habitat quality and leading to contraction of the distributional range thus increasing competition (e.g. Casini et al, 2016), increased competition for benthic food sources with flounder (Orio et al, 2019; Orio 2020) as well as increased intraspecific competition and growth bottlenecks within the population (Svedäng & Hornborg, 2014).
However, these potential explanatory variables have not been evaluated on a fine spatial scale, even though factors such as competition, food availability and local habitat quality likely occur on fine spatial scales. Instead, averages over larger spatial areas (e.g. ICES subdivisions) have been the variables in previous comparisons. Moreover, the ability of each of proposed explanatory variable (linked to a hypothesis) to explain variation in the condition of cod has not been compared in a standardized way, and has not been contrasted to residual spatial and spatiotemporal variation.
In this study I have compiled data for individual-level condition in the whole Baltic Sea (essentially everything south and east of Kattegatt) and matched that with predictor variables (on a haul level) representing different ecological hypotheses regarding drivers of variation in cod condition.
To account for spatial and temporal autocorrelation using data at this scale, we apply spatiotemporal predictive-process GLMMs using the R-package sdmTMB. This modeling framework allows evaluation of how much of the variation in condition can be explained by covariates, spatial (unmeasured variation in condition that is stable over time) and spatiotemporal variation (unmeasured variation in condition that changes between years).
In this script, the aim is to find a good default model explaining spatiotemporal variation in cod condition, without any of the suggested explanatory variables (the rest will be done in a different script). Then we show an example model with an additional explanatory variable. The project currently lives here. Details follow below.
In fishes, weight is typically assumed to vary log-normally around an average allometric function of length: \(w=al^b\), where \(w\) is weight in grams, \(l\) is length in cm, \(b\) is the allometric length exponent and \(a\) is the condition factor in unit \(g/L^b\) (Froese et al., 2014; Grüss et al., 2020). Typically this relationship is linearized by taking logs on both sides: \(\operatorname{log}(w)=a+b\operatorname{log}(l)\). Le Cren’s condition index is defined as the residuals from this length-weight relationship.
We model this individual-level relationship with a spatiotemporal GLMM of the form (minor deviations from this model are discussed in this document):
\[ \operatorname{log}(w_{s,t})\sim\operatorname{Student-t}(\mu_{s,t},\sigma,\nu)\\ \mu_{s,t} = \alpha_t + \beta_ddepth + \boldsymbol\omega_s + \boldsymbol\epsilon_{s,t} + \sum^{n_k}_{k=1}\boldsymbol\gamma_k\boldsymbol{X_k} + \beta \operatorname{log}(l), \]
where the degrees of freedom, \(\nu\), are set to 2 and \(\phi\) is the standard deviation. \(\alpha_t\) is a time-varying intercept:
\[ \alpha_t \sim \operatorname{Normal}(\alpha_{t-1}, \sigma^2_\gamma). \]
\(\beta_d\) is the coefficient for \(depth\), \(\boldsymbol\omega_s\) and \(\boldsymbol\epsilon_{s,t}\) represent spatial and spatiotemporal random effects, respectively. \(\boldsymbol{X_k}\) is a matrix of \(n_k\) measured additional covariates and \(\boldsymbol\gamma_k\) is the effect of the \(k\)-th additional covariate. \(\beta\) is the length-coefficient, corresponding to the allometric exponent \(b\). The spatial and spatiotemporal random effects are assumed to be drawn from a multivariate normal distribution:
\[ \boldsymbol\omega \sim \operatorname{MVNormal}(\boldsymbol0, \boldsymbol\Sigma_\omega)\\ \boldsymbol\epsilon_t \sim \operatorname{MVNormal}(\boldsymbol0, \boldsymbol\Sigma_\epsilon). \]
We also consider the spatiotemporal random effects to be drawn from a multivariate normal distribution following an AR1 process:
\[ \boldsymbol\delta_{t=1} \sim \operatorname{MVNormal}(\boldsymbol0, \boldsymbol\Sigma_\epsilon)\\ \boldsymbol\delta_{t>1} = \phi\boldsymbol\delta_{t-1} + \sqrt{1-\phi^2}\boldsymbol\epsilon_t, \boldsymbol\epsilon_t \sim \operatorname{MVNormal}(\boldsymbol0, \boldsymbol\Sigma_\epsilon). \]
In the spatial and spatiotemporal random fields, \(\Sigma_\omega\) and \(\Sigma_\epsilon\) are covariance matricies, where the covariance (\(\Phi(s, s')\)) between spatial points \(s\) and \(s'\) is given by a Matérn function:
\[ \Phi(s, s') = \tau^2/\Gamma(\nu)2^{\nu-1}(\kappa d_{jk})^{\nu}K_\nu(\kappa d_{jk}), \] where \(\tau^2\) is the spatial (marginal) variance.
This model (first equation) can be viewed as an approximation of Le Cren’s condition index (Grüss et al., 2020), as the log of the condition factor, i.e. \(\operatorname{log}(a)\) or the constant in the allometric relationship, can be defined as: \(\operatorname{log}(a) = \alpha_t + \boldsymbol\omega_s + \boldsymbol\epsilon_{s,t} + \sum^{n_k}_{k=1}\boldsymbol\gamma_k\boldsymbol{X_k}\). Thus, Eq. 1 is a model for a spatially and temporally varying condition factor.
The basic model structure above was determined after exploring different options, including different distributions, different degrees of freedom in the Student-t distribution, spatial trends, year as factor effects and # of knots (not shown here), and also based on the ecological question. For instance, depth is included to test if density variables have interactive effects with depth, which to extent is a proxy for benthic production (more on that in the model comparison script!). We no longer include sex as a covariate, and this is mainly because the predicted difference between males and females is very small (difference in the third decimal), and that un-sexed individuals are not in between the two estimates as would be expected but smaller.
Therefore, in this script we will mostly focus on different ways to model temporal changes (time varying intercept, year as factor or spatial trends), as well as comparing the support for autoregressive vs independent spatiotemporal fields and the inclusion of depth and how that affects other components of the model (mainly oxygen and spatial random fields).
After that we proceed with a final model and exemplify how models with additional covariates that are hypothesized to drive variation in the condition factor can be fitted.
The data are individual-level measurements of length and weight of Baltic cod in quarter 4 between the year 1991-2019, and stem from the Baltic International Trawl Survey (BITS) (can be downloaded from DATRAS). These data are cleaned and merged with additional covariate data in this script.
In this script I will only use oxygen concentration to illustrate adding covariates, but we also consider CPUE of cod and flounder as well as abundance of sprat and herring in this project (see separate script for more details). The covariates are standardized to have a mean of 0 and a standard deviation of 1, to allow for direct comparison with the spatial and spatiotemporal standard deviation, following Thorson (2015) and Grüss et al (2020).
library(tidyverse); theme_set(theme_classic())
library(tidylog)
library(viridis)
library(sdmTMB) # remotes::install_github("pbs-assess/sdmTMB")
library(marmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(RColorBrewer)
library(gganimate)
library(gifski)
library(png)
library(qwraps2) # To load entire cache in interactive r session, do: qwraps2::lazyload_cache_dir(path = "R/explore/condition_model_setup_cache/html")
# For adding maps to plots
world <- ne_countries(scale = "medium", returnclass = "sf")
# Specify map ranges
ymin = 54; ymax = 58; xmin = 9.5; xmax = 22
Now read data:
d <- readr::read_csv("https://raw.githubusercontent.com/maxlindmark/cod_condition/master/data/for_analysis/mdat_cond.csv")
# Calculate standardized variables
d <- d %>%
mutate(ln_length_cm = log(length_cm),
ln_weight_g = log(weight_g),
oxy_st = oxy,
oxy_rec_st = oxy_rec,
temp_st = temp,
temp_rec_st = temp_rec,
abun_her_st = abun_her,
abun_her_sd_st = abun_her_sd,
abun_spr_st = abun_spr,
abun_spr_sd_st = abun_spr_sd,
cpue_cod_st = cpue_cod,
cpue_cod_rec_st = cpue_cod_rec,
cpue_fle_st = cpue_fle,
cpue_fle_rec_st = cpue_fle_rec,
depth_st = depth) %>%
mutate_at(c("oxy_st", "oxy_rec_st", "temp_st", "temp_rec_st",
"abun_her_st", "abun_her_sd_st", "abun_spr_st", "abun_spr_sd_st",
"cpue_cod_st", "cpue_cod_rec_st", "cpue_fle_st", "cpue_fle_rec_st",
"depth_st"),
~(scale(.) %>% as.vector)) %>%
mutate(year = as.integer(year))
unique(is.na(d$abun_spr_st))
#> [1] FALSE TRUE
d %>% drop_na(abun_spr_st)
# almost 3k data points (3%) when dropping NA sprat... So we remove it?
Read the prediction grids:
pred_grid <- readr::read_csv("https://raw.githubusercontent.com/maxlindmark/cod_condition/master/data/for_analysis/pred_grid.csv")
pred_grid <- pred_grid %>%
mutate(ln_length_cm = log(1)) %>% # For now we'll predict changes in the intercept ("condition factor")
mutate(X = lon,
Y = lat,
year = as.integer(year),
depth_st = 0) %>% # In this prediction grid I keep depth at its mean, below I have a more realistic prediction grid
filter(year %in% c(unique(d$year)))
# And now read in pred_grid2 which in addition has oxygen and temperature values at location and time and depth:
pred_grid2 <- readr::read_csv("https://raw.githubusercontent.com/maxlindmark/cod_condition/master/data/for_analysis/pred_grid2.csv")
pred_grid2 <- pred_grid2 %>%
mutate(ln_length_cm = log(1)) %>% # For now we'll predict changes in the intercept ("condition factor")
mutate(X = lon, Y = lat, year = as.integer(year)) %>%
filter(year %in% c(unique(d$year))) %>%
mutate(depth_st = (depth - mean(d$depth))/sd(d$depth),
oxy_st = (oxy - mean(d$oxy))/sd(d$oxy)) # Need to scale these to the mean and sd in the data!
# Note that I'm not including temperature here, only oxygen
We can now plot the length-weight relationship as well as Fulton K condition factor to get a glimpse of what we might expect (but in the main model we use Le Cren’s condition index).
# First we can can make sure that the relationship between length and weight is in fact log-linear. This is a core assumption, and if it is not, the length-exponent may itself be dependent on size, in which case the condition factors will be biased.
ggplot(d, aes(ln_length_cm, ln_weight_g)) +
geom_point() +
stat_smooth()
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# Plot the distribution of data
ggplot(d, aes(Fulton_K)) + geom_histogram()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Plot "Fulton K" in space and time
d %>%
filter(Fulton_K < 2) %>% # remove the extremes for now
ggplot(., aes(lon, lat, color = Fulton_K)) +
geom_point(size = 1) +
facet_wrap(~ year, ncol = 5) +
scale_color_gradient2(midpoint = mean(d$Fulton_K)) +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax))
#> filter: removed 34 rows (<1%), 106,291 rows remaining
There is a clear temporal development in condition and the spatial coverage of data varies by year (fewer data in the beginning of the time series).
In earlier versions we used this: spde <- make_mesh(data = d, xy_cols = c("lon", "lat"), n_knots = 110, type = "kmeans", seed = 42). But now we have even more islands in the data since I am using also the western Baltic Sea, so we will use the add_barrier_mesh to include an island effect (following the example function). Note this also means we can increase the # of knots before hitting convergence issues (how much varies from model to model though!)
# Crop the polygon for plotting and efficiency:
baltic_coast <- suppressWarnings(suppressMessages(
st_crop(world,
c(xmin = xmin, ymin = ymin, xmax = xmax, ymax = ymax))))
crs <- 4326 # https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset#Common_EPSG_codes, WGS84
st_crs(baltic_coast) <- 4326 # 'WGS84'; necessary on some installs
baltic_coast <- st_transform(baltic_coast, crs)
# Project our survey data coordinates:
survey <- d %>% dplyr::select(lon, lat, ln_weight_g) %>%
st_as_sf(crs = 4326, coords = c("lon", "lat"))
# Plot our coast and survey data:
ggplot(baltic_coast) +
geom_sf() +
geom_sf(data = survey, size = 0.5)
# Prepare for making the mesh
# First, we will extract the coordinates:
surv_coords <- st_coordinates(survey)
spde <- make_mesh(d, xy_cols = c("lon", "lat"),
n_knots = 200, type = "kmeans", seed = 42)
plot(spde)
# Add on the barrier mesh component:
bspde <- add_barrier_mesh(
spde, baltic_coast, range_fraction = 0.2,
proj_scaling = 1, plot = TRUE
)
# In the above, the grey dots are the centre of triangles that are in the
# ocean. The red crosses are centres of triangles that are over land. The
# spatial range will be assumed to be 0.2 (`range_fraction`) over land compared
# to over water.
# We can make a more advanced plot if we want:
mesh_df_water <- bspde$mesh_sf[bspde$normal_triangles, ]
mesh_df_land <- bspde$mesh_sf[bspde$barrier_triangles, ]
ggplot(baltic_coast) +
geom_sf() +
geom_sf(data = mesh_df_water, size = 1, colour = "blue") +
geom_sf(data = mesh_df_land, size = 1, colour = "green")
# Now, when we fit our model with the new mesh, it will automatically
# include a barrier structure in the spatial correlation:
m1 <- sdmTMB(formula = ln_weight_g ~ ln_length_cm -1, time_varying = ~1, data = d, time = "year",
spde = bspde, family = student(link = "identity", df = 5), ar1_fields = FALSE,
include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = TRUE)
We can also consider the following models, with a spatial trend (to capture the larger decline in condition in the eestern Baltic), or the simple approach with a year factor instead of a random walk.
# extra: spatial trends
m_trends <- sdmTMB(formula = ln_weight_g ~ ln_length_cm -1, data = d, time = "year",
spde = bspde, family = student(link = "identity", df = 5), ar1_fields = FALSE,
include_spatial = TRUE, spatial_trend = TRUE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = TRUE)
# extra: factor year
m_year <- sdmTMB(formula = ln_weight_g ~ ln_length_cm + factor(year) -1, data = d, time = "year",
spde = bspde, family = student(link = "identity", df = 5), ar1_fields = FALSE,
include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = TRUE)
Here we compare the fixed year effect further with the random walk estimates. (Further down we make a prediction of the whole condition factor - here we just compare the intercepts):
random_walk <- data.frame(summary(TMB::sdreport(m1$tmb_obj))) %>%
rownames_to_column("param") %>%
filter(stringr::str_detect(param, 'b_rw_t')) %>%
rename("se" = "Std..Error") %>%
mutate(year = sort(unique(d$year)),
upr = Estimate + 1.96*se,
lwr = Estimate - 1.96*se,
model = "random walk")
factor_year <- data.frame(summary(TMB::sdreport(m_year$tmb_obj))) %>%
rownames_to_column("param") %>%
filter(stringr::str_detect(param, 'b_j.')) %>% # These are the intercepts, the length-coefficient is b_j
rename("se" = "Std..Error") %>%
mutate(year = sort(unique(d$year)),
upr = Estimate + 1.96*se,
lwr = Estimate - 1.96*se,
model = "factor(year)")
dodge <- position_dodge(width = 0.3)
bind_rows(random_walk, factor_year) %>%
ggplot(., aes(year, Estimate, color = model, shape = model)) +
ylab("Intercepts") +
geom_point(size = 3, position = dodge) +
geom_errorbar(aes(x = year, ymax = upr, ymin = lwr),
width = 0.2, position = dodge, alpha = 0.8) +
theme(axis.text.x = element_text(angle = 30),
legend.position = c(0.8, 0.8)) +
scale_color_brewer(palette = "Dark2") +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "Year") +
NULL
The random walk model estimates less variable condition (shrinks the estimates), and there seems to be a kind of plateau starting in 2004 following a more steep annual decline.
And here we will just show the spatial trend prediction:
p_trends <- predict(m_trends, newdata = pred_grid)
# Replace too-deep predictions with NA
p_trends <- p_trends %>% mutate(zeta_s2 = ifelse(depth > 120, NA, zeta_s)) # spatial trend
ggplot(filter(p_trends, year == 2000), aes(lon, lat, fill = zeta_s2)) +
geom_raster() +
scale_fill_viridis_c() +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Spatial trends")
This plot shows that there are some east-west spatial differences in the change in condition. Despite this spatial trend, we will here proceed with the random walk on intercepts models.
# Model 1 with AR1 spatiotemporal random fields
m2 <- sdmTMB(formula = ln_weight_g ~ ln_length_cm -1, time_varying = ~1, data = d, time = "year",
spde = bspde, family = student(link = "identity", df = 5), ar1_fields = TRUE,
include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = TRUE)
Look at the residuals:
df <- d
df$residuals_m1 <- residuals(m1)
df$residuals_m2 <- residuals(m2)
qqnorm(df$residuals_m1); abline(a = 0, b = 1)
qqnorm(df$residuals_m2); abline(a = 0, b = 1)
# They look *very* similar, plot against each other
plot(df$residuals_m1 ~ df$residuals_m2)
We can also check the AR1 parameter (rho is ar_phi on the -1 to 1 scale):
tidy(m2, effects = "ran_pars", conf.int = TRUE) %>% filter(term == "rho")
#> filter: removed 6 rows (86%), one row remaining
The AR1 confidence interval does not overlap 0, which indicates we should keep it like that (FYI when we set spatial_trend = TRUE, the AR1 correlation parameter becomes very small!). We can also Compare their AICs:
AIC(m1)
#> [1] -168014.9
AIC(m2)
#> [1] -168029.5
We will continue with the AR1 spatiotemporal fields. Plot the residuals also against predictions and then over length:
pred_m2 <- predict(m2)
df$pred_m2 <- pred_m2$est
# Residuals vs predicted
ggplot(df, aes(pred_m2, residuals_m2)) +
geom_point(alpha = 0.1, color = "grey20") +
geom_abline(color = "red", slope = 0, linetype = 2) +
geom_smooth()
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# Residuals vs length
ggplot(df, aes(ln_length_cm, residuals_m2)) +
geom_point(alpha = 0.1, color = "grey20") +
geom_abline(color = "red", slope = 0, linetype = 2) +
geom_smooth()
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
They look a little bit funny at small lengths (and small predicted weights), and worrisome in that there seems to be a peak in the middle of the predicted data (and length). I think that is because a) the small ones are quite rare and b) data are in cm size-classes (a bit coarse for a ~5 cm fish, hence many are assigned the same length). The tendency for positive residuals at around ~30 cm could be related to changes in the diet (e.g. inclusion of herring).
We can also plot the residuals by year and size:
colourCount = length(unique(d$year))
getPalette = colorRampPalette(brewer.pal(9, "RdYlBu"))
ggplot(df, aes(ln_length_cm, residuals_m2)) +
geom_point(alpha = 0.1, color = "grey20") +
geom_abline(color = "red", slope = 0, linetype = 2) +
geom_smooth(data = df, aes(ln_length_cm, residuals_m2, group = year, color = factor(year)),
inherit.aes = FALSE, se = FALSE) +
scale_color_manual(values = getPalette(colourCount))
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
At least the gam (geom_smooth) shows a diverging pattern for fish larger than 60 cm with a clear temporal pattern. Before 2004, residuals are negative, such that cod observed sizes are smaller than predicted, whereas after 2004 it’s opposite. This is opposite from what I expected, which is that we would overestimate the weight given a specific length in later years if the condition has declined faster in the larger fish. Could this have something to do with the sizes sampled? We do see that we have longer fish early the time series.
ggplot(d, aes(ln_length_cm, fill = factor(year), color = factor(year))) +
geom_density(alpha = 0.1, fill = NA) +
scale_color_manual(values = getPalette(colourCount))
# We can also look at the residuals in detail and not stare too much on the mean gam fit
ggplot(df, aes(ln_length_cm, residuals_m2)) +
geom_point(alpha = 0.5, color = "grey20") +
geom_abline(color = "red", slope = 0, linetype = 2) +
geom_smooth(data = df, aes(ln_length_cm, residuals_m2, group = year, color = factor(year)),
inherit.aes = FALSE) +
scale_color_manual(values = getPalette(colourCount)) +
coord_cartesian(xlim = c(4.1, 4.8)) +
facet_wrap(~year, ncol = 5) +
guides(color = FALSE, fill = FALSE)
#> `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
# Can also plot histograms
df %>% filter(ln_length_cm > 4) %>% # This is the size where they are diverging
ggplot(., aes(residuals_m2)) +
geom_histogram() +
facet_wrap(~year, ncol = 5, scales = "free_y") +
geom_vline(color = "red", xintercept = 0, linetype = 2) +
ggtitle("distributions of residuals for cod >60 cm")
#> filter: removed 99,040 rows (93%), 7,285 rows remaining
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In the last plots, the residuals length*year pattern is not as clear as in the first glance of the plot with the smooth curves (which does not tell us about the uncertainty.
We can also check the residuals on a map:
ggplot(df, aes(lon, lat, colour = residuals_m2)) +
geom_point(size = 0.5) +
facet_wrap(~year, ncol = 5) +
scale_color_gradient2()
Maybe some clustering remains… But overall OK I think!
Now that we have random structure, I will refit it with reml=FALSE, because from now on we will only change and compare models with different fixed effects. For the base model, we can consider depth. Depth is an interesting variable, because in the eastern Baltic sea where we have a stronger decline in condition, it is also deeper. Deep areas are generally low in oxygen, but also shallow areas can be that. However, in the eastern Baltic sea, there are no benthic prey for cod besides Saduria entomon, and they do not occupy deep areas (Gogina et al., 2020). and because saduria no not live in the western Baltic sea, we could not include that as a variable directly but would instead have to use a proxy for many different benthic species. Hence, depth can in the whole study area be seen as a proxy for availability of benthic food. First, plot the relationship between oxygen and depth, to make sure they are not too correlated:
ggplot(d, aes(depth_st, oxy_st)) +
geom_point() +
annotate("text", label = paste("correlation coefficient = ", round(cor(d$oxy_st, d$depth_st), digits = 2), sep = ""),
x = -Inf, y = -Inf, hjust = -0.1, vjust = -1)
Now compare the chosen model with the model that in addition has depth (both are fitted with reml = FALSE so that we can use AIC)
# Add depth to model 2
m2c <- sdmTMB(formula = ln_weight_g ~ ln_length_cm + depth_st -1, time_varying = ~ 1, data = d, time = "year",
spde = bspde, family = student(link = "identity", df = 5), ar1_fields = TRUE,
include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = FALSE)
#> Warning: The model may not have converged. Maximum final gradient:
#> 0.0119542242049562.
Compare the models using AIC:
AIC(m2, m2c) %>% arrange(AIC)
Much better to also include depth! Check the parameter estimates…
print(m2c)
#> Spatial model fit by ML ['sdmTMB']
#> Formula: ln_weight_g ~ ln_length_cm + depth_st - 1
#> Time column: "year"
#> SPDE: bspde
#> Data: d
#> Family: student(link = 'identity')
#> coef.est coef.se
#> ln_length_cm 2.98 0
#> depth_st -0.03 0
#>
#> Time-varying parameters:
#> coef.est coef.se
#> (Intercept)-1993 -4.47 0.01
#> (Intercept)-1994 -4.51 0.01
#> (Intercept)-1995 -4.52 0.01
#> (Intercept)-1996 -4.47 0.01
#> (Intercept)-1997 -4.53 0.01
#> (Intercept)-1998 -4.57 0.01
#> (Intercept)-1999 -4.57 0.01
#> (Intercept)-2000 -4.55 0.01
#> (Intercept)-2001 -4.57 0.01
#> (Intercept)-2002 -4.59 0.01
#> (Intercept)-2003 -4.56 0.01
#> (Intercept)-2004 -4.61 0.01
#> (Intercept)-2005 -4.63 0.01
#> (Intercept)-2006 -4.62 0.01
#> (Intercept)-2007 -4.62 0.01
#> (Intercept)-2008 -4.63 0.01
#> (Intercept)-2009 -4.64 0.01
#> (Intercept)-2010 -4.63 0.01
#> (Intercept)-2011 -4.64 0.01
#> (Intercept)-2012 -4.63 0.01
#> (Intercept)-2013 -4.64 0.01
#> (Intercept)-2014 -4.64 0.01
#> (Intercept)-2015 -4.63 0.01
#> (Intercept)-2016 -4.63 0.01
#> (Intercept)-2017 -4.65 0.01
#> (Intercept)-2018 -4.66 0.01
#> (Intercept)-2019 -4.64 0.01
#>
#> Matern range parameter: 0.51
#> Dispersion parameter: 0.09
#> Spatial SD (sigma_O): 0.00
#> Spatiotemporal SD (sigma_E): 0.00
#> Spatiotemporal AR1 correlation (rho): 0.21
#> ML criterion at convergence: -84141.722
rw_intercepts <- data.frame(summary(TMB::sdreport(m2c$tmb_obj))) %>%
rownames_to_column("param") %>%
filter(stringr::str_detect(param, 'b_rw_t'))
#> filter: removed 12,113 rows (>99%), 27 rows remaining
mean(rw_intercepts$Estimate)
#> [1] -4.595433
Now we can predict and plot estimates using all fixed and random effects on pre-made grid. This grid is created by doing an expand.grid over the survey range, then filtering out areas that are actually in the ocean using ICES shapefiles. Lastly some areas are too deep for sampling (-120 m). There is a depth column in the prediction grad and we can make those predictions NA so it is clear they are different from e.g. land and islands (and so that the color gradient is not going too far because of the extreme predictions at these depths).
p <- predict(m2c, newdata = pred_grid)
# Replace too-deep predictions with NA
p <- p %>% mutate(est2 = ifelse(depth > 120, NA, est), # prediction (fixed + random)
eps_st2 = ifelse(depth > 120, NA, epsilon_st), # spatiotemporal effects
omega_s2 = ifelse(depth > 120, NA, omega_s), # spatial random effect
est_non_rf2 = ifelse(depth > 120, NA, est_non_rf)) # fixed effects + random walk (everything not a random field)
Plot the predicted condition with fixed and random effects:
ggplot(p, aes(lon, lat, fill = est2)) +
geom_raster() +
facet_wrap(~year, ncol = 5) +
scale_fill_viridis(option = "magma",
name = "log(condition factor)") +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Prediction (random + fixed)")
Plot the spatiotemporal random effects:
ggplot(p, aes(lon, lat, fill = eps_st2)) +
geom_raster() +
facet_wrap(~year, ncol = 5) +
scale_fill_gradient2(name = "eps_st") +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Spatiotemporal random effects")
Plot the spatial random effects:
ggplot(filter(p, year == 2000), aes(lon, lat, fill = omega_s2)) +
geom_raster() +
scale_fill_gradient2(name = "omega_s") +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Spatial random effects")
#> filter: removed 124,350 rows (96%), 4,974 rows remaining
The spatial random field seems to largely follow depth (except in the most western part which is shallow but relatively low oxygen), which makes sense since it is constant across years and reflects e.g. oxygen concentration and overall habitat quality (deep areas are even anoxic).
Now we want to refit the same model with one of the additional fixed effects outlined above, to illustrate the suggested approach.
Here is an example of how the importance of additional covariates can be evaluated. To keep it simple we use one oxygen, modeled as a linear effect and with a spline because there might be a threshold effect of it on condition. The rest of this analysis (i.e. with all covariates) will be done in different rmarkdown file.
# Fit model with oxygen concentration as covariate
moxy <- sdmTMB(formula = ln_weight_g ~ ln_length_cm + depth_st + oxy_st -1, time_varying = ~ 1,
data = d, time = "year", spde = bspde, family = student(link = "identity", df = 5),
ar1_fields = TRUE, include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = FALSE)
#> Warning: The model may not have converged. Maximum final gradient:
#> 0.0753081063215149.
# And one with oxygen modeled as a spline
moxy_gam <- sdmTMB(formula = ln_weight_g ~ ln_length_cm + depth_st + s(oxy_st, k = 3) -1, time_varying = ~ 1,
data = d, time = "year", spde = bspde, family = student(link = "identity", df = 5),
ar1_fields = TRUE, include_spatial = TRUE, spatial_trend = FALSE, spatial_only = FALSE,
silent = TRUE, newton_steps = 1, reml = FALSE)
#> Warning: The model may not have converged. Maximum final gradient:
#> 0.0140502990034719.
# Check the models
print(moxy)
#> Spatial model fit by ML ['sdmTMB']
#> Formula: ln_weight_g ~ ln_length_cm + depth_st + oxy_st - 1
#> Time column: "year"
#> SPDE: bspde
#> Data: d
#> Family: student(link = 'identity')
#> coef.est coef.se
#> ln_length_cm 2.98 0
#> depth_st -0.02 0
#> oxy_st 0.01 0
#>
#> Time-varying parameters:
#> coef.est coef.se
#> (Intercept)-1993 -4.48 0.01
#> (Intercept)-1994 -4.51 0.01
#> (Intercept)-1995 -4.53 0.01
#> (Intercept)-1996 -4.47 0.01
#> (Intercept)-1997 -4.53 0.01
#> (Intercept)-1998 -4.57 0.01
#> (Intercept)-1999 -4.57 0.01
#> (Intercept)-2000 -4.55 0.01
#> (Intercept)-2001 -4.57 0.01
#> (Intercept)-2002 -4.59 0.01
#> (Intercept)-2003 -4.56 0.01
#> (Intercept)-2004 -4.61 0.01
#> (Intercept)-2005 -4.63 0.01
#> (Intercept)-2006 -4.62 0.01
#> (Intercept)-2007 -4.62 0.01
#> (Intercept)-2008 -4.63 0.01
#> (Intercept)-2009 -4.64 0.01
#> (Intercept)-2010 -4.63 0.01
#> (Intercept)-2011 -4.64 0.01
#> (Intercept)-2012 -4.63 0.01
#> (Intercept)-2013 -4.64 0.01
#> (Intercept)-2014 -4.64 0.01
#> (Intercept)-2015 -4.63 0.01
#> (Intercept)-2016 -4.63 0.01
#> (Intercept)-2017 -4.65 0.01
#> (Intercept)-2018 -4.66 0.01
#> (Intercept)-2019 -4.64 0.01
#>
#> Matern range parameter: 0.49
#> Dispersion parameter: 0.09
#> Spatial SD (sigma_O): 0.00
#> Spatiotemporal SD (sigma_E): 0.00
#> Spatiotemporal AR1 correlation (rho): 0.21
#> ML criterion at convergence: -84156.335
print(moxy_gam)
#> Spatial model fit by ML ['sdmTMB']
#> Formula: ln_weight_g ~ ln_length_cm + depth_st + s(oxy_st, k = 3) - 1
#> Time column: "year"
#> SPDE: bspde
#> Data: d
#> Family: student(link = 'identity')
#> coef.est coef.se
#> ln_length_cm 2.98 0
#> depth_st -0.02 0
#> s(oxy_st).1 0.00 0
#> s(oxy_st).2 0.01 0
#>
#> Time-varying parameters:
#> coef.est coef.se
#> (Intercept)-1993 -4.48 0.01
#> (Intercept)-1994 -4.51 0.01
#> (Intercept)-1995 -4.53 0.01
#> (Intercept)-1996 -4.47 0.01
#> (Intercept)-1997 -4.53 0.01
#> (Intercept)-1998 -4.57 0.01
#> (Intercept)-1999 -4.57 0.01
#> (Intercept)-2000 -4.55 0.01
#> (Intercept)-2001 -4.57 0.01
#> (Intercept)-2002 -4.59 0.01
#> (Intercept)-2003 -4.56 0.01
#> (Intercept)-2004 -4.61 0.01
#> (Intercept)-2005 -4.63 0.01
#> (Intercept)-2006 -4.62 0.01
#> (Intercept)-2007 -4.62 0.01
#> (Intercept)-2008 -4.63 0.01
#> (Intercept)-2009 -4.64 0.01
#> (Intercept)-2010 -4.63 0.01
#> (Intercept)-2011 -4.64 0.01
#> (Intercept)-2012 -4.63 0.01
#> (Intercept)-2013 -4.64 0.01
#> (Intercept)-2014 -4.64 0.01
#> (Intercept)-2015 -4.63 0.01
#> (Intercept)-2016 -4.63 0.01
#> (Intercept)-2017 -4.65 0.01
#> (Intercept)-2018 -4.66 0.01
#> (Intercept)-2019 -4.64 0.01
#>
#> Matern range parameter: 0.49
#> Dispersion parameter: 0.09
#> Spatial SD (sigma_O): 0.00
#> Spatiotemporal SD (sigma_E): 0.00
#> Spatiotemporal AR1 correlation (rho): 0.21
#> ML criterion at convergence: -84156.422
Check the new fixed effect estimates and their confidence interval:
tidy(moxy, conf.int = TRUE)
tidy(moxy_gam, conf.int = TRUE)
Plot marginal effects:
# Prepare prediction data frame
nd_oxy <- data.frame(oxy_st = seq(min(d$oxy_st), max(d$oxy_st), length.out = 100))
nd_oxy$year <- 2003L
nd_oxy$depth_st <- 0
nd_oxy$ln_length_cm <- 0
# Predict from linear oxygen model
p_margin_oxy <- predict(moxy, newdata = nd_oxy, se_fit = TRUE, re_form = NA)
# Predict from spline model
p_margin_oxy_gam <- predict(moxy_gam, newdata = nd_oxy, se_fit = TRUE, re_form = NA)
# Plot together:
oxy_preds <- bind_rows(mutate(p_margin_oxy, model = "linear"),
mutate(p_margin_oxy_gam, model = "gam"))
#> mutate: new variable 'model' (character) with one unique value and 0% NA
#> mutate: new variable 'model' (character) with one unique value and 0% NA
ggplot(oxy_preds, aes(oxy_st, est,
ymin = est - 1.96 * est_se, ymax = est + 1.96 * est_se, fill = model, color = model)) +
geom_ribbon(alpha = 0.4) + geom_line() +
scale_color_brewer(palette = "Set1") +
scale_fill_brewer(palette = "Set1") +
theme(legend.position = c(0.1, 0.9))
Does not look like a very strong support for a spline here. We can also compare their AICs:
AIC(m2c, moxy, moxy_gam) %>% arrange(AIC)
Now look more closely at the our estimates, specifically comparing the standard deviation of spatial and spatiotemporal variation with our coefficients for the standardized variables:
# Extract random and fixed coefficients from the oxygen model, bind rows
moxy_est <- bind_rows(tidy(moxy, effects = "ran_par", conf.int = TRUE) %>%
filter(term %in% c("sigma_O", "sigma_E")),
tidy(moxy, effects = "fixed", conf.int = TRUE) %>%
filter(term %in% c("oxy_st", "depth_st"))) %>%
mutate(model = "oxygen model\n(AIC selected)")
# ... And the default model
m2c_est <- bind_rows(tidy(m2c, effects = "ran_par", conf.int = TRUE) %>%
filter(term %in% c("sigma_O", "sigma_E")),
tidy(m2c, effects = "fixed", conf.int = TRUE) %>%
filter(term %in% c("depth_st"))) %>%
mutate(model = "default model")
coef_df <- bind_rows(moxy_est, m2c_est)
dodge <- position_dodge(width = 0.3)
ggplot(coef_df, aes(term, estimate, color = model, group = model)) +
geom_point(size = 2, position = dodge) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), position = dodge, width = 0.2) +
scale_color_brewer(palette = "Dark2") +
geom_hline(yintercept = 0, linetype = 2, color = "gray") +
ggtitle("Spatial and spatiotemporal variation vs covariates")
I interpret this as follows: The effect of oxygen is stronger now than in previous versions, and the magnitude is larger than that of spatial and spatiotemporal variation (mainly because we know use model-predicted oxygen instead of the nearest observed oxygen in-situ concentration which was inaccurate). Inclusion of the depth variable reduces the standard deviation of the spatial random field, and to a smaller degree also the spatiotemporal standard deviation. Including oxygen reduces the effect of depth (which at least partly is due to deep areas being low in oxygen).
We can also produce a map to look at the differences between the predictions across space and time from the default model and the model with the oxygen covariate (predicted at the mean oxygen levels, 0, in the latter case):
# Add in a fixed covariate here
pred_grid_oxy <- pred_grid
pred_grid_oxy$oxy_st <- 0 # Mean since standardized
poxy <- predict(moxy, newdata = pred_grid_oxy)
# Replace too-deep predictions with NA
poxy <- poxy %>% mutate(est2 = ifelse(depth > 120, NA, est))
poxy$est_ratio <- poxy$est2 / p$est2 # p being the prediction from the default model with depth, mdef2
ggplot(poxy, aes(X, Y, fill = est_ratio)) +
geom_raster() +
facet_wrap(~ year, ncol = 5) +
scale_fill_gradient2(midpoint = 1) +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Ratio of prediction (random + fixed) oxygen model at mean oxygen:default") +
labs(x = "lon", y = "lat")
Not surprisingly perhaps, the model predictions differ mostly in areas with low oxygen concentration (although the difference is small!).
We can also plot the same map with the oxygen values at the location:
poxy2 <- predict(moxy, newdata = pred_grid2)
# Replace too-deep predictions with NA
poxy2 <- poxy2 %>% mutate(est2 = ifelse(depth > 120, NA, est))
# Calculate ratio
poxy2$est_ratio <- poxy2$est2 / p$est2
#> Warning in poxy2$est2/p$est2: longer object length is not a multiple of shorter
#> object length
ggplot(poxy2, aes(X, Y, fill = est_ratio)) +
geom_raster() +
facet_wrap(~year, ncol = 5) +
scale_fill_gradient2(midpoint = 1) +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Ratio of prediction (random + fixed) oxygen model with covariate:default") +
labs(x = "lon", y = "lat")
The model with oxygen leads to smaller predictions in shallow areas and larger predictions in deeper areas, which is probably related to the fact the the inclusion of oxygen reduces the magnitude of the depth covariate.
Now plot the average condition factor across the survey area by year:
# From these models, predict annual condition factor
# Grabbing the number of cells to help with calculating the average
pred_grid2 <- pred_grid2 %>% drop_na(oxy_st)
ncells <- filter(pred_grid2, year == max(pred_grid2$year)) %>% nrow()
# Use the `area` argument here to turn the total into an average by giving it one over the number of cells
average_condition_moxy <- predict(moxy, newdata = pred_grid2, return_tmb_object = TRUE, area = 1/ncells)
# Make a little helper function... bias correction shouldn't do anything here because of the identity link
get_average_condition <- function(obj, level = 0.95, ...) {
sdmTMB:::get_generic(obj, value_name = "link_total",
bias_correct = FALSE, level = level, trans = I, ...)
}
avg_condition <- get_average_condition(average_condition_moxy)
ggplot(avg_condition, aes(year, est)) +
ylab("Average\nlog(condition factor)") +
geom_point(size = 3) +
geom_errorbar(aes(x = year, ymax = upr, ymin = lwr),
width = 0.2, position = dodge, alpha = 0.8) +
theme(axis.text.x = element_text(angle = 30),
legend.position = c(0.8, 0.8)) +
scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
labs(x = "Year") +
NULL
Lastly, we can make a prediction over a grid with the oxygen values values at each location across time pred_grid2:
# Replace too-deep predictions with NA
poxy2 <- poxy2 %>% mutate(est2 = ifelse(depth > 120, NA, est))
ggplot(poxy2, aes(X, Y, fill = est2)) +
geom_raster() +
facet_wrap(~year, ncol = 5) +
scale_fill_viridis(option = "magma",
name = "log(condition factor)") +
geom_sf(data = world, inherit.aes = F, size = 0.2) +
coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +
ggtitle("Prediction (random + fixed)") +
labs(x = "lon", y = "lat")
Again, the story seems to be that condition is getting worse throughout the Baltic Sea and it’s already worse in deep areas, especially east of Bornholm and north of the Gdansk deep (see bathymetry map below), and maybe the time trend is also slightly stronger there!
# All defaults
pred_grid2 %>%
filter(year == 1999) %>%
filter(depth > 0) %>%
ggplot(., aes(lon, lat, fill = depth)) +
scale_fill_gradientn(colours = rev(terrain.colors(10)),
limits = c(min(drop_na(pred_grid2)$depth),
max(drop_na(pred_grid2)$depth))) +
geom_raster()
Anderson, S.C., Keppel, E.A., Edwards, A.M. 2019. A reproducible data synopsis for over 100 species of British Columbia groundfish. DFO Can. Sci. Advis. Sec. Res. Doc. 2019/041. vii + 321 p.
Casini, M., Käll, F., Hansson, M., Plikshs, M., Baranova, T., Karlsson, O., Lundström, K., Neuenfeldt, S., Gårdmark, A. and Hjelm, J., 2016. Hypoxic areas, density-dependence and food limitation drive the body condition of a heavily exploited marine fish predator. Royal Society open science, 3(10), p.160416.
Froese, R., Thorson, J.T. and Reyes Jr, R.B., 2014. A Bayesian approach for estimating length‐weight relationships in fishes. Journal of Applied Ichthyology, 30(1), pp.78-85.
Gogina, M., Zettler, M.L., Wåhlström, I., Andersson, H., Radtke, H., Kuznetsov, I. and MacKenzie, B.R., 2020. A combination of species distribution and ocean-biogeochemical models suggests that climate change overrides eutrophication as the driver of future distributions of a key benthic crustacean in the estuarine ecosystem of the Baltic Sea. ICES Journal of Marine Science, 77(6), pp.2089-2105.
Grüss, A., Gao, J., Thorson, J.T., Rooper, C.N., Thompson, G., Boldt, J.L. and Lauth, R., 2020. Estimating synchronous changes in condition and density in eastern Bering Sea fishes. Marine Ecology Progress Series, 635, pp.169-185.
Gårdmark, A., Casini, M., Huss, M., van Leeuwen, A., Hjelm, J., Persson, L. and de Roos, A.M., 2015. Regime shifts in exploited marine food webs: detecting mechanisms underlying alternative stable states using size-structured community dynamics theory. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1659), p.20130262.
Neuenfeldt, S., Bartolino, V., Orio, A., Andersen, K.H., Andersen, N.G., Niiranen, S., Bergström, U., Ustups, D., Kulatska, N. and Casini, M., 2020. Feeding and growth of Atlantic cod (Gadus morhua L.) in the eastern Baltic Sea under environmental change. ICES Journal of Marine Science, 77(2), pp.624-632.
Orio, A., Bergström, U., Florin, A-B., Lehmann, A., Šics, I. and Casini, M., 2019. Spatial contraction of demersal fish populations in a large marine ecosystem. Journal of Biogeography, 46(3), pp.633-645.
Orio, A., Bergström, U., Florin, A-B., Šics, I. and Casini, M., 2020. Long-term changes in spatial overlap between interacting cod and flounder in the Baltic Sea. Hydrobiologia, 847(11), pp.2541-2553.
Svedäng, H. and Hornborg, S., 2014. Selective fishing induces density-dependent growth. Nature communications, 5(1), pp.1-6.
Thorson, J.T., 2015. Spatio-temporal variation in fish condition is not consistently explained by density, temperature, or season for California Current groundfishes. Marine Ecology Progress Series, 526, pp.101-112.